
    A Crowdsourcing Approach to Identify Common Method Bias and Self-Representation

    Pertinent questions in the measurement of social indicators are the verification of data gathered online (e.g., controlling for self-representation on social networks) and its appropriate use in community management and policy-making. Across platforms like Facebook, LinkedIn, Twitter, and blogging services, users (sub)consciously represent themselves in a way that is appropriate for their intended audience (Qiu et al., 2012; Zhao et al., 2008). However, scholars in the social sciences and computer science have not yet adequately addressed controlling for self-representation, or the propensity to display or censor oneself, in their analyses (Zhao et al., 2008; Das and Kramer, 2013). As such, researchers working on these platforms risk analysing ‘gamified’, socially desirable, or online-disinhibited (‘troll’) personas, a problem that goes beyond standard efforts to contain Common Method Bias (CMB) (Linville, 1985; Suler, 2004; Podsakoff et al., 2003). What has not been approached in a systematic way is the verification of such data against offline, actual personality. In this paper, we focus on aligning traditional survey methods with unobtrusive methods that gather profile data from online social media via crowdsourcing platforms.

    The Trending Customer Needs (TCN) Dataset: A Benchmarking and Automated Evaluation Approach for New Product Development

    In recent years, many studies have summarized User Generated Content as lists of ranked keyphrases representing customer needs for the purposes of New Product Development. However, existing methods for the evaluation of keyphrase lists do not robustly assess solutions for these purposes. Therefore, in this paper we present the “Trending Customer Needs” (TCN) dataset of over 9,000 top-trending customer-need keyphrases, organized by month from 2007 to 2021 and spanning 37 product categories in the area of Consumer Packaged Goods (e.g., toothpaste, eyeliner, beer). TCN is a curated dataset for the benchmarking of supervised machine learning approaches to the prediction of customer needs from User Generated Content. We describe the process of curating TCN while ensuring its quality. Finally, we demonstrate its utility via a case study of Reddit discourse as a potential predictor of future customer needs in Consumer Packaged Goods.
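
    As an illustration of the kind of automated evaluation such a benchmark supports, the sketch below compares a predicted keyphrase list against a ground-truth list using exact-match precision, recall and F1. The function, normalisation step and example keyphrases are illustrative assumptions, not the TCN dataset's actual evaluation protocol or data.

    # Hypothetical sketch: exact-match evaluation of a predicted keyphrase list
    # against one month's ground-truth customer-need keyphrases (assumed setup).
    def precision_recall_f1(predicted, gold):
        """Compare two keyphrase lists after simple normalisation."""
        pred = {p.strip().lower() for p in predicted}
        true = {g.strip().lower() for g in gold}
        hits = len(pred & true)
        precision = hits / len(pred) if pred else 0.0
        recall = hits / len(true) if true else 0.0
        f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
        return precision, recall, f1

    # Example: gold keyphrases for one category-month vs. a model's top predictions.
    gold_keyphrases = ["whitening toothpaste", "charcoal toothpaste", "zero waste packaging"]
    predicted = ["charcoal toothpaste", "vegan toothpaste", "zero waste packaging"]
    print(precision_recall_f1(predicted, gold_keyphrases))  # ≈ (0.67, 0.67, 0.67)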

    Utilizing Social Media For Lead Generation

    Social media is the most prevalent platform for communication and for forming and maintaining professional as well as social relationships. The growth of platforms and the exponential rise in the user base of websites like LinkedIn, Facebook and Twitter are evidence of their widespread acceptance. They offer many opportunities for businesses to exploit this facet of digitally mediated relationships, for example by spreading awareness of the business and engaging with prospective customers. The focus of this research is the use of social media to identify relevant profiles, or ‘leads’, for a business sourcing new employees or collaborators. The paper utilizes data from the social networking sites Twitter and LinkedIn and presents an automated approach for the discovery of leads. For the considered business cases, Twitter was found to be irrelevant for lead generation due to its emphasis on personal rather than professional user positioning. The final approach utilizes only four attributes from LinkedIn users’ profiles to generate high-quality leads, and is tested for robustness to variations in the input data, different business contexts, and noise in the input data. The results show that the approach generates leads robustly and consistently despite utilizing only a small subset of profile features.
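
    A minimal sketch of attribute-based lead scoring is given below, assuming a handful of profile fields and hand-picked weights; the attribute names, keywords and weights are hypothetical and are not the four LinkedIn attributes identified in the paper.

    # Illustrative only: score candidate profiles on a small set of attributes;
    # higher scores indicate better leads for an assumed business context.
    CONTEXT_KEYWORDS = {"machine learning", "data engineering"}  # assumed target context

    def lead_score(profile: dict) -> float:
        """Score a profile dict on four simple features."""
        score = 0.0
        if any(k in profile.get("headline", "").lower() for k in CONTEXT_KEYWORDS):
            score += 2.0                                                 # headline matches context
        score += min(profile.get("years_experience", 0), 10) / 10.0     # capped experience signal
        score += 1.0 if profile.get("location") == "Dublin" else 0.0    # proximity (assumed city)
        score += min(len(profile.get("skills", [])) / 20.0, 1.0)        # breadth of listed skills
        return score

    candidates = [
        {"headline": "Data Engineering Lead", "years_experience": 8,
         "location": "Dublin", "skills": ["spark", "python"]},
        {"headline": "Travel blogger", "years_experience": 2,
         "location": "Lisbon", "skills": ["photography"]},
    ]
    leads = sorted(candidates, key=lead_score, reverse=True)
    print([c["headline"] for c in leads])  # highest-scoring profiles first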

    Subnetwork ensembling and data augmentation: Effects on calibration

    Deep Learning models based on convolutional neural networks are known to be poorly calibrated, that is, either overconfident or underconfident in their predictions. Safety-critical applications of neural networks, however, require models to be well-calibrated, and there are various methods in the literature to increase model performance and calibration. Subnetwork ensembling exploits the over-parametrization of modern neural networks by fitting several subnetworks into a single network, gaining the benefits of ensembling without additional computational cost. Data augmentation methods have also been shown to enhance model performance in terms of accuracy and calibration. However, ensembling and data augmentation seem orthogonal to each other, and the total effect of combining these two methods is not well known; the literature is in fact inconsistent. Through an extensive set of empirical experiments, we show that combining subnetwork ensemble methods with data augmentation methods does not degrade model calibration.
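
    The sketch below shows one common way to combine member predictions and measure calibration: averaging the softmax outputs of several subnetworks and computing the Expected Calibration Error (ECE). The random arrays stand in for real model outputs; this is an assumed setup, not the paper's experimental code.

    import numpy as np

    def expected_calibration_error(probs, labels, n_bins=15):
        """ECE: weighted gap between confidence and accuracy over confidence bins."""
        confidences = probs.max(axis=1)
        predictions = probs.argmax(axis=1)
        accuracies = (predictions == labels).astype(float)
        bins = np.linspace(0.0, 1.0, n_bins + 1)
        ece = 0.0
        for lo, hi in zip(bins[:-1], bins[1:]):
            mask = (confidences > lo) & (confidences <= hi)
            if mask.any():
                ece += mask.mean() * abs(accuracies[mask].mean() - confidences[mask].mean())
        return ece

    # member_probs: one (n_samples, n_classes) softmax output per subnetwork,
    # each evaluated on (possibly augmented) inputs; random placeholders here.
    member_probs = [np.random.dirichlet(np.ones(10), size=100) for _ in range(3)]
    labels = np.random.randint(0, 10, size=100)
    ensemble_probs = np.mean(member_probs, axis=0)  # simple probability averaging
    print(expected_calibration_error(ensemble_probs, labels))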

    FBWatch: Extracting, Analyzing and Visualizing Public Facebook Profiles

    An ever-increasing volume of social media data, so-called Big Data, facilitates studies of behavior patterns, consumption habits, and B2B exchanges. Whilst many tools exist for platforms such as Twitter, there is a noticeable absence of tools for Facebook-based studies that are both scalable and accessible to social scientists. In this paper, we present FBWatch, an open-source web application providing the core functionality to fetch public Facebook profiles en masse in their entirety and analyse relationships between profiles both online and offline. We argue that FBWatch is a robust interface for social researchers and business analysts to identify, analyze and visualize relationships, discourse and interactions between public Facebook entities and their audiences.
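
    A hedged sketch of the kind of collection step such a tool performs is shown below: pulling a public page's posts from the Facebook Graph API and following pagination links. The API version, access-token requirements and field names are assumptions and are not taken from FBWatch's own implementation.

    import requests

    GRAPH = "https://graph.facebook.com/v19.0"   # assumed API version
    ACCESS_TOKEN = "REPLACE_WITH_A_VALID_TOKEN"  # placeholder; required permissions vary

    def fetch_page_posts(page_id: str, limit: int = 100):
        """Yield posts from a public page, following Graph API pagination."""
        url = f"{GRAPH}/{page_id}/posts"
        params = {"access_token": ACCESS_TOKEN,
                  "fields": "message,created_time", "limit": limit}
        while url:
            resp = requests.get(url, params=params, timeout=30)
            resp.raise_for_status()
            payload = resp.json()
            yield from payload.get("data", [])
            url = payload.get("paging", {}).get("next")  # absolute URL for the next page
            params = {}                                  # token is already embedded in `next`

    # for post in fetch_page_posts("SomePublicPage"):
    #     print(post.get("created_time"), post.get("message", "")[:80])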

    Is Quality Control Pointless?

    Intrinsic to the transition towards, and necessary for the success of, digital platforms as a service (at scale) is the notion of human computation. Going beyond ‘the wisdom of the crowd’, human computation is the engine that powers now-ubiquitous platforms and services such as Duolingo and Wikipedia. In spite of increasing research and popular interest, several issues remain open and under debate in large-scale human computation projects. Quality control is foremost among these. We conducted an experiment with three tasks of varying complexity and five different methods to identify and protect against consistently under-performing contributors. We show that minimal quality control is enough to repel consistently under-performing contributors and that this effect is consistent across tasks of varying complexity.
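
    One minimal quality-control scheme of the kind discussed here is to seed tasks with ‘gold’ items of known answer and exclude contributors whose accuracy on them falls below a threshold. The sketch below is illustrative only; the threshold, tasks and answers are assumed and do not correspond to the five methods evaluated in the experiment.

    # Assumed gold items and threshold for demonstration purposes.
    GOLD_ANSWERS = {"task_17": "cat", "task_42": "dog", "task_88": "cat"}
    THRESHOLD = 0.6  # minimum accuracy on gold items to retain a contributor

    def keep_contributor(responses: dict) -> bool:
        """responses: {task_id: answer} for the gold tasks a contributor saw."""
        graded = [responses[t] == a for t, a in GOLD_ANSWERS.items() if t in responses]
        return bool(graded) and sum(graded) / len(graded) >= THRESHOLD

    workers = {
        "w1": {"task_17": "cat", "task_42": "dog", "task_88": "cat"},   # careful contributor
        "w2": {"task_17": "dog", "task_42": "dog", "task_88": "bird"},  # under-performing
    }
    retained = [w for w, r in workers.items() if keep_contributor(r)]
    print(retained)  # ['w1']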

    On-demand distributed image processing over an adaptive Campus-Grid

    This thesis explores how scientific applications based upon short jobs (seconds and minutes) can capitalize upon the idle workstations of a Campus-Grid. These resources are donated on a voluntary basis, and consequently the Campus-Grid is constantly adapting as the availability of workstations changes. Typically, a Condor system or equivalent would be used to utilize these resources. However, such systems are designed with different trade-offs and incentives in mind and therefore do not provide intrinsic support for short jobs. The motivation for creating a provisioning scenario for short jobs is that image processing, like other areas of scientific analysis, is typically composed of short-running jobs but still requires parallel solutions. Much of the literature in this area comments on the challenges of performing such analysis efficiently and effectively even when dedicated resources are in use. The main challenges are latency and scheduling penalties, granularity, and the potential for very short jobs. A volunteer Grid retains these challenges and adds further ones: unpredictable resource availability and longevity, and multiple machine owners and administrators who directly affect the operating environment. Ultimately, this creates the requirement for well-conceived and effective fault-management strategies; however, these are typically not in place to enable transparent, fault-free job administration for the user. This research demonstrates that these challenges can be answered, and that opportunistically sourced Campus-Grid resources can host disparate applications composed of short-running jobs as little as one second in length. This is demonstrated by significant performance improvements over a well-established Condor system: job efficiency increased from 60–70% to 95–100%, application makespan was reduced by up to 99%, and the efficiency of resource utilization increased by up to 13,000%. The Condor pool used comprises approximately 1,600 workstations distributed across 27 administrative domains of Cardiff University. The application domain of this research is Matlab-based image processing, and the application area used to demonstrate the approach is the analysis of Magnetic Resonance Imagery (MRI). However, the presented approach is generalizable to any application domain with similar characteristics.
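
    A back-of-the-envelope sketch of why very short jobs suffer under conventional scheduling: per-job overhead dominates the useful work, depressing job efficiency and inflating makespan. The overhead figures below are illustrative assumptions, not measurements from the thesis.

    def efficiency(job_runtime_s: float, per_job_overhead_s: float) -> float:
        """Fraction of wall-clock time per job spent doing useful work."""
        return job_runtime_s / (job_runtime_s + per_job_overhead_s)

    def makespan(n_jobs: int, job_runtime_s: float, per_job_overhead_s: float, n_workers: int) -> float:
        """Idealised makespan when jobs (plus their overhead) are spread evenly over workers."""
        return n_jobs * (job_runtime_s + per_job_overhead_s) / n_workers

    # 10,000 one-second jobs on 100 workers: a heavyweight scheduler vs. a lightweight dispatcher.
    for overhead in (30.0, 0.1):
        print(f"overhead={overhead:>5}s  efficiency={efficiency(1.0, overhead):.0%}  "
              f"makespan={makespan(10_000, 1.0, overhead, 100):,.0f}s")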